SCARPA: scaffolding reads with practical algorithms

Authors

  • Nilgun Donmez
  • Michael Brudno
Abstract

MOTIVATION: Scaffolding is the process of ordering and orienting contigs produced during genome assembly. Accurate scaffolding is essential for finishing draft assemblies, as it facilitates the costly and laborious procedures needed to fill in the gaps between contigs. Conventional formulations of the scaffolding problem are intractable, and most scaffolding programs rely on heuristic or approximate solutions, with potentially exponential running time.

RESULTS: We present SCARPA, a novel scaffolder, which combines fixed-parameter tractable and bounded algorithms with Linear Programming to produce near-optimal scaffolds. We test SCARPA on real datasets in addition to a simulated diploid genome and compare its performance with several state-of-the-art scaffolders. We show that SCARPA produces longer or similar length scaffolds that are highly accurate compared with other scaffolders. SCARPA is also capable of detecting misassembled contigs and reports them during scaffolding.

AVAILABILITY: SCARPA is open source and available from http://compbio.cs.toronto.edu/scarpa.
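To make the "orienting" half of the problem concrete, here is a minimal Python sketch. It is not SCARPA's algorithm, which the abstract describes only as a combination of fixed-parameter tractable algorithms and Linear Programming. The sketch assumes a hypothetical input format in which each read-pair link between two contigs has already been reduced to a flag saying whether the contigs appear to lie on the same strand. The function name `orient_contigs` and the tuple layout are illustrative assumptions. It propagates orientations by breadth-first search and reports links that contradict the assignment, which is the situation that makes real scaffolding hard.

```python
from collections import defaultdict, deque


def flip(strand):
    """Return the opposite strand symbol."""
    return "-" if strand == "+" else "+"


def orient_contigs(links):
    """Assign '+' or '-' to every contig mentioned in `links`.

    links: iterable of (contig_a, contig_b, same_strand) tuples
           (hypothetical format); same_strand is True when the read pairs
           suggest both contigs lie on the same strand.
    Returns (orientation, conflicts): a dict contig -> '+'/'-' and a sorted
    list of contig pairs whose link disagrees with the assignment.
    """
    graph = defaultdict(list)
    for a, b, same in links:
        graph[a].append((b, same))
        graph[b].append((a, same))

    orientation = {}
    conflicts = set()
    for start in list(graph):
        if start in orientation:
            continue
        # Arbitrarily fix the first contig of each connected component.
        orientation[start] = "+"
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor, same in graph[node]:
                expected = orientation[node] if same else flip(orientation[node])
                if neighbor not in orientation:
                    orientation[neighbor] = expected
                    queue.append(neighbor)
                elif orientation[neighbor] != expected:
                    # Inconsistent evidence: record the pair once.
                    conflicts.add(tuple(sorted((node, neighbor))))
    return orientation, sorted(conflicts)


consistent = [("a", "b", False), ("b", "c", False)]
noisy = [("c1", "c2", True), ("c2", "c3", False), ("c1", "c3", True)]
print(orient_contigs(consistent))
print(orient_contigs(noisy))
```

With consistent links the search finds an orientation that satisfies every link. With contradictory links, as in `noisy`, no assignment satisfies all of them, and a scaffolder must decide which evidence to discard. The sketch simply keeps the first orientation it reaches and flags the disagreement.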


Similar resources

Scaffolding Large Genomes using Integer Linear Programming

The rapidly diminishing cost of genome sequencing is driving renewed interest in large scale genome sequencing programs such as Genome 10K (G10K). Despite renewed interest, the assembly of large genomes from short reads is still an extremely resource intensive process. This work presents a scalable algorithm to create scaffolds, or ordered and oriented sets of assembled contigs, which is one pa...


The Role of Scaffolding in Nature School to Promote Sustainable Development in Education

Purpose: The overall purpose of this study was to investigate the role of scaffolding in nature school in order to promote sustainable development in education. Methodology: The present study is practical in terms of purpose and a survey in terms of descriptive method. The statistical population consisted ...


ScaffMatch: Scaffolding Algorithm Based on Maximum Weight Matching

MOTIVATION Next-generation high-throughput sequencing has become a state-of-the-art technique in genome assembly. Scaffolding is one of the main stages of the assembly pipeline. During this stage, contigs assembled from the paired-end reads are merged into bigger chains called scaffolds. Because of a high level of statistical noise, chimeric reads, and genome repeats the problem of scaffolding ...



Improved gap size estimation for scaffolding algorithms

MOTIVATION One of the important steps of genome assembly is scaffolding, in which contigs are linked using information from read-pairs. Scaffolding provides estimates about the order, relative orientation and distance between contigs. We have found that contig distance estimates are generally strongly biased and based on false assumptions. Since erroneous distance estimates can mislead in subse...


Journal:
  • Bioinformatics

Volume 29, Issue 4

Pages: -

Publication date: 2013